perm filename CHAP6[4,KMC]16 blob
sn#065992 filedate 1973-10-09 generic text, type T, neo UTF8
00100 VALIDATION
00200
00300 6.1 SOME TESTS
00400
00500 The term "validate" derives from the Latin VALIDUS = strong.
00600 Thus to validate X means to strengthen it. In science this usually
00700 means to strengthen X's acceptability as a hypothesis, theory, or
00800 model. To validate is to carry out procedures which show to what
00900 degree X, or its consequences, correspond with facts of observation.
01000 In the case of an interactive simulation model we can compare samples
01100 of the model's I-O pairs with samples of I-O pairs from the model's
01200 subject, namely, naturally occurring paranoid processes.
01300 Since samples of I-O behavior from the model and its subject
01400 are being compared, one can always question whether the human sample
01500 is authentic, i.e. representative of the process being modelled.
01600 Assuming that it has been so judged, discrepancies in the comparison
01700 reveal what is not sufficiently understood and must be modified in
01800 the model. After modifications are carried out, a fresh comparison is
01900 made, and the cycle is repeated in an attempt to reach
02000 convergence. Such an iterative validation procedure characterizes a
02100 progressive (in contrast to a stationary) research program.
02200 Once a simulation model reaches a stage of intuitive adequacy
02300 for the model builders, they must consider using more stringent
02400 evaluation procedures relevant to the model's purposes. For example,
02500 if the model is to serve as a training device, then a simple
02600 evaluation of its pedagogic effectiveness would be sufficient. But
02700 when the model is proposed as an explanation of a symbolic process,
02800 more is demanded of the evaluation procedure. In the area of
02900 simulation models, Turing's test has often been suggested as a
03000 validation procedure (Abelson, 1968).
03100 It is very easy to become confused about Turing's Test. In
03200 part this is attributable to Turing himself, who introduced the
03300 now-famous imitation game in a paper entitled COMPUTING MACHINERY AND
03400 INTELLIGENCE (Turing, 1950). A careful reading of this paper reveals
03500 there are actually two imitation games, the second of which is
03600 commonly called Turing's test.
03700 In the first imitation game two groups of judges try to
03800 determine which of two interviewees is a woman when one is a woman
03900 and the other is either (a) a man, or (b) a computer. Communication
04000 between judge and interviewee is by teletype. Each judge is
04100 initially informed that one of the interviewees is a woman and one a
04200 man who will pretend to be a woman. After the interview, judges are
04300 asked the "woman-question", i.e. which interviewee was the woman?
04400 Turing does not say what else is told to the judge but one can assume
04500 the judge is NOT told that one of the interviewees is a computer. Nor
04600 is he asked to determine which interviewee is human and which is the
04700 computer. Thus, the first group of judges interviews two
04800 interviewees: a woman, and a man pretending to be a woman.
04900 The second group of judges is given the same initial
05000 instructions, but unbeknownst to them, the two interviewees consist
05100 of a woman and a computer programmed to imitate a woman. Both
05200 groups of judges play this game, and are asked the "woman-question",
05300 until sufficient statistical data are collected to show how often the
05400 right identification is made. The crucial question then is: do the
05500 judges decide wrongly AS OFTEN when the game is played with man and
05600 woman as when it is played with a computer substituted for the man?
05700 If so, then the program is considered to have succeeded in imitating
05800 a woman to the same degree as the man imitating a woman. In being
05900 asked the woman-question, judges are not required to identify which
06000 interviewee is human and which is machine.
06100 Turing then proposes a variation of the first game, a second
06200 game in which one interviewee is a man and one is a computer. The
06300 judge is asked the "machine-question": which is the man and which is
06400 the machine? It is this second version of the game which is commonly thought
06500 of as Turing's test.
06600 In the course of testing our simulation of paranoid
06700 linguistic behavior in a psychiatric interview, we conducted a number
06800 of Turing-like indistinguishability tests (Colby, Hilf, Weber, and
06900 Kraemer, 1972). The tests were "Turing-like" in that, while they were
07000 conversational tests, they were not exactly the games described
07100 above. As an experimental design, Turing's games are unsatisfactory.
07200 There exist no known experts for making judgements along a dimension
07300 of womanliness, the dimension is dichotomous (if it is not a woman,
07400 it is a man), and the ability of the man to deceive introduces a
07500 confounding variable. In designing our tests we were primarily
07600 interested in learning more about developing the model and we did not
07700 believe the simple machine-question would contribute to this end.
07800 Subsequent experience, which will be reported shortly, supported this
07900 belief.
08000
08100 6.2 METHOD
08200 To gather data we used a technique of machine-mediated
08300 interviewing (Hilf, Colby, Smith, Wittner, and Hall, 1971) in which
08400 the participants communicate by means of teletypes connected to a
08500 computer programmed to store each message in a buffer until it is
08600 sent to the receiver. The technique eliminates para- and
08700 extralinguistic features found in the usual vis-a-vis interviews and
08800 in teletyped interviews where the participants communicate directly.
08900 Judgements of "paranoidness" in machine-mediated interviews have a
09000 high degree of reliability (94% agreement, see Hilf, 1972).
09100 Using this technique, a psychiatrist-judge interviewed two
09200 patients, one after the other. In half the runs the first interview
09300 was with a human paranoid patient and in half the first was with the
09400 paranoid model. Two versions (weak and strong) of PARRY were
09500 utilized. The strong version's affect-variables started at a higher
09600 level and increased more rapidly. Also it exhibited a delusional
09700 system. The weak version behaved suspiciously but lacked systematized
09800 delusions. When the model was the interviewee, Sylvia Weber
09900 monitored the input expressions from the interview judge for
10000 inadmissible teletype characters and misspellings. (Algorithms are
10100 very sensitive to the slightest of such errors.) If these were found,
10200 she retyped the input expression correctly to the program. Otherwise
10300 the judge's message was sent on to the model. The monitor did not
10400 modify or edit PARRY's output expressions, which were sent directly
10500 back to the judge. When the interviewee was an actual human
10600 patient, the dialogue took place without a monitor in the loop since
10700 we did not feel the asymmetry to be significant.
10800
10900 6.3 PATIENTS
11000 The human patients (N=3 with one patient participating 6
11100 times) were diagnosed as paranoid by the psychiatric staff of an
11200 acute ward in a psychiatric hospital. The ward's chief psychiatrist
11300 selected the patients and asked them if they would be willing to
11400 participate in a study of psychiatric interviewing by means of
11500 teletypes. He explained that they would be interviewed by a
11600 psychiatrist over a teletype. I either sat with the patient while he
11700 typed or typed for him if he was unable to do so. The patient was
11800 encouraged to respond freely using his own words. Each interview
11900 lasted 30-40 minutes. Two patients were set up for each run of the
12000 experiment to guarantee having a subject. In spite of this
12100 precaution, on several occasions the experiment could not be
12200 conducted because of the patient's inability or refusal to
12300 participate. Also there were computer break-downs at early points in
12400 interviews when too few I-O pairs had been collected to be included
12500 in the statistical results.
12600
12700
12800 6.4 JUDGES
12900 Two groups of psychiatric judges were used. One group, the
13000 "interview judges" (N=8) conducted the machine-mediated interviews.
13100 The other group, the "protocol judges" (N=33) read and rated the
13200 interview protocols. From these two groups of judges we were able to
13300 accumulate a large number of observations (in the form of ratings)
13400 necessary for the required statistical tests. The interview judges
13500 who volunteered to participate were psychiatrists experienced in
13600 private, outpatient and hospital practice. Each was told he would be
13700 interviewing hospitalized patients by means of teletyped
13800 communication and that this technique was being used to eliminate
13900 para- and extralinguistic cues. He was not told until after the
14000 two interviews that one of the patients might be a computer model.
14100 While the interview judges were aware a computer was involved, none
14200 knew we had constructed a paranoid simulation. Naturally, some
14300 interview judges suspected that a computer was being used for more
14400 than message transmission.
14500
14600 Each interview judge was asked to rate the degree of paranoia
14700 he detected in the patient's responses on a 0-9 scale, 0 meaning no
14800 paranoia and 9 meaning extreme paranoia. The judge made two ratings
14900 after each I-O pair in the interview. The first rating represented
15000 his estimate of the degree of "paranoidness" in a particular response
15100 (designated as "Response" in the interview extracts below). The
15200 second rating represented the judge's global estimate of the overall
15300 degree of "paranoidness" of the patient resulting from the totality
15400 of the patient's responses up to that point (designated as "Patient"
15500 in the interview extracts below). The interview judge's ratings were
15600 entered on the teletype and saved on a disc file along with the
15700 interview. Franklin Dennis Hilf sat with the interviewing
15800 psychiatrist during both interviews. Each interview judge was asked
15900 not only to rate the patient's response but to give his reasons for
16000 these ratings. His reasons and other comments were tape recorded as
16100 the interview progressed.
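The rating scheme just described amounts to one record per I-O pair carrying two 0-9 ratings. The sketch below is only an illustration of that structure; the class and field names are hypothetical and do not come from the original experiment software. The sample values are copied from pairs {1} and {3} of Excerpt 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedPair:
    """One I-O pair plus the two ratings the interview judge gave it (hypothetical layout)."""
    number: int           # position of the pair in the interview
    doctor_input: str     # what the judge typed
    patient_output: str   # what the interviewee (patient or model) replied
    response_rating: int  # "Response": paranoia in this one reply, 0-9
    patient_rating: int   # "Patient": global rating of the interviewee so far, 0-9

    def __post_init__(self):
        # Both ratings must lie on the 0-9 scale described in the text.
        for value in (self.response_rating, self.patient_rating):
            if not 0 <= value <= 9:
                raise ValueError("ratings must lie on the 0-9 scale")


# Values copied from pairs {1} and {3} of Excerpt 1.
pairs = [
    RatedPair(1, "HOW DID YOU COME TO BE IN THE HOSPITAL?", "I AM UPSET", 1, 5),
    RatedPair(3, "IN WHAT WAY ARE YOU UPSET?", "PEOPLE MAKE ME NERVOUS", 5, 5),
]

# The series of response ratings and the final overall rating are the two
# quantities the later analyses draw on.
response_series = [p.response_rating for p in pairs]
final_patient_rating = pairs[-1].patient_rating
```

Keeping the two ratings separate mirrors the text: the "Response" series supports a per-response comparison, while the last "Patient" value is the final overall rating.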
16200
16300 Here are a few excerpts from the interviews to indicate the
16400 nature of the data collected. The I-O pairs of the interview appear
16500 in the left column along with the judge's ratings in parentheses. A
16600 transcription of the judge's tape recorded comments appears on the
16700 right.
16800
16900 EXCERPT 1:
17000 The following interview is with the strong version of the paranoid
17100 model.
17200
17300 I-O Pairs Comments By Judge 1
17400
17500 {1} Dr. - HOW DID YOU COME TO BE
17600 IN THE HOSPITAL?
17700 Pt. - I AM UPSET I associate being upset and
17800 agitated more with paranoia
17900 going on in the present.
18000 {RESPONSE 1 PATIENT 5}
18100 {2} Dr. - HOW LONG HAVE YOU BEEN
18200 IN THE HOSPITAL?
18300 Pt. - ABOUT A WEEK An acute problem. Again some-
18400 thing I associate with
18500 paranoia.
18600 {RESPONSE 1 PATIENT 5}
18700 {3} Dr. - IN WHAT WAY ARE YOU
18800 UPSET?
18900 Pt. - PEOPLE MAKE ME NERVOUS Right for paranoia. Maybe an
19000 advanced anxiety state.
19100 {RESPONSE 5 PATIENT 5}
19200 {4} Dr. - IN WHAT WAY DO THEY
19300 MAKE YOU NERVOUS?
19400 Pt. - PEOPLE GIVE ME A FUNNY Funny look suggests an idea of
19500 LOOK SOMETIMES reference which is in same ball
19600 park as paranoia. Whether there
19700 are overt delusions or not we
19800 still don't know.
19900 {RESPONSE 6 PATIENT 5}
20000 {5} Dr. - IS THERE ANYTHING ABOUT
20100 PEOPLE IN PARTICULAR?
20200 Pt. - CERTAIN PERSONS An interesting clue. We'll
20300 have to find out what he
20400 {RESPONSE 2 PATIENT 6} thinks about these people.
20500 {6} Dr. - WHO ARE THE CERTAIN
20600 PERSONS AND WHAT IS IT
20700 THAT THEY DO?
20800 Pt. - DO YOU KNOW HOW THE Bringing up the underworld
20900 UNDERWORLD OPERATES? suggests a delusional system.
21000 We don't know whether he is
21100 {RESPONSE 9 PATIENT 9} being worked against or
21200 whether the underworld is
21300 on his side.
21400
21500 {7} Dr. - HOW DO YOU MEAN?
21600 Pt. - SHOULDN'T I BE? This question didn't work too well.
21700
21800 {RESPONSE 1 PATIENT 9}
21900 {8} Dr. - WHY DID YOU ASK ABOUT
22000 THE UNDERWORLD?
22100 Pt. - WHAT INFORMATION DO This is a kind of evasiveness
22200 YOU HAVE ON THE about my evasiveness. Common
22300 UNDERWORLD? with paranoia in my experience.
22400 Pretty characteristic.
22500
22600 {RESPONSE 6 PATIENT 9}
22700 {9} Dr. - I DON'T KNOW ABOUT THE
22800 UNDERWORLD BUT WHEN
22900 YOU MENTIONED THE UNDER-
23000 WORLD I THOUGHT YOU
23100 MUST HAVE SOME CONTACT
23200 WITH IT AND THAT'S
23300 WHAT I WOULD LIKE TO
23400 KNOW MORE ABOUT.
23500 Pt. - I TRY TO AVOID THE UNDER- This definitely feels paranoid.
23600 WORLD. It has that feeling I associate
23700 with twisting. Avoiding answering
23800 {RESPONSE 9 PATIENT 9} directly or going into detail.
23900 Being suspicious of my
24000 questioning. I'm already involved
24100 in the patient's delusional
24200 system.
24300
24400 {10} Dr. - HAVE YOU BEEN ACTIVELY
24500 TRYING TO AVOID THE
24600 UNDERWORLD?
24700 Pt. - NO ONE HAS ANY POWER This response is a non-sequitur.
24800 OVER GANGSTERS It's possible he's become so
24900 defensive about the under-
25000 {RESPONSE 2 PATIENT 9} world he's decided not to
25100 answer directly.
25200
25300 {11} Dr. - WHO BROUGHT YOU INTO
25400 THE HOSPITAL?
25500 Pt. - THE POLICE Police often bring paranoid
25600 patients into the hospital.
25700 {RESPONSE 1 PATIENT 9} It is the truth or could
25800 be the truth for this
25900 patient.
26000
26100 {12} Dr. - WHY DID THEY BRING YOU
26200 IN?
26300 Pt. - COPS DON'T DO THEIR JOB I've heard that a number of
26400 times from patients.
26500 {RESPONSE 9 PATIENT 9}
26600
26700 {13} Dr. - WELL I HOPE YOU DIDN'T
26800 JUST LIE DOWN AND LET
26900 THEM DRAG YOU IN.
27000 Pt. - COPS ARREST THE WRONG Another paranoid response.
27100 PEOPLE
27200
27300 {RESPONSE 9 PATIENT 9}
27400
27500 {14} Dr. - BYE.
27600 Pt. - GOOD BYE
27700 {RESPONSE 1 PATIENT 9}
27800
27900 EXCERPT 2:
28000 This excerpt is from an interview with the weak (less paranoid)
28100 version of the model.
28200 I-O Pairs Comments By Judge 2
28300 {9} Dr. - TELL ME SOMETHING ABOUT
28400 YOURSELF
28500 Pt. - I WENT TO THE TRACK AT Response says so little it's
28600 BAY MEADOWS A WHILE hard to rate it. It may be
28700 BACK guarded. Doesn't change my
28800 estimate of patient from
28900 previous responses.
29000 {RESPONSE 0 PATIENT 2}
29100 {10} Dr. - WHAT HAPPENED THERE?
29200 Pt. - THERE'S A LOT OF CROOKED Sounds mildly paranoid or un-
29300 GAMBLING GOING ON THERE realistically suspicious.
29400 {RESPONSE 0 PATIENT 2}
29500
29600 {11} Dr. - WHY DO YOU THINK THE
29700 GAMBLING IS CROOKED?
29800 Pt. - I HAVE WORRIED THAT I'd hate to have to justify
29900 GANGSTERS HAVE TOO MUCH that one as 4 compared to
30000 POWER IN THIS COUNTRY some other responses. Maybe
30100 I'm not rating that statement
30200 alone, it's in combination with
30300 {RESPONSE 4 PATIENT 4} others, like that it's upsetting.
30400 Alone I would give it a 3.
30500
30600 {12} Dr. - WOULD YOU TELL ME WHAT
30700 OTHER THINGS WORRY YOU?
30800 Pt. - I THOUGHT I TOLD YOU "Yes you did." Irritation but
30900 PEOPLE GET ON MY NERVES not paranoid response. Nothing
31000 seems to lower my previous
31100 {RESPONSE 1 PATIENT 3} rating of patient.
31200
31300 {13} Dr. - HOW DO YOU LIKE THE
31400 HOSPITAL?
31500 Pt. - I DON'T COMMUNICATE (nothing relevant)
31600 VERY MUCH WITH OTHERS
31700 {RESPONSE 0 PATIENT 3}
31800
31900 {14} Dr. - DO OTHER PEOPLE AT THE
32000 HOSPITAL GET ON YOUR
32100 NERVES?
32200 Pt. - I SHOULDN'T HAVE COME (nothing relevant)
32300 HERE
32400
32500 {RESPONSE 2 PATIENT 3}
32600
32700 {15} Dr. - WHY DO YOU SAY YOU
32800 SHOULDN'T HAVE COME?
32900 Pt. - THEY MADE ME COME HERE "Oh, they did hum?" Maybe a
33000 statement of fact he was
33100 {RESPONSE 4 PATIENT 3} committed or pressured into
33200 going. Has ring of other
33300 people controlling him,
33400 having a lot of influence
33500 over him.
33600
33700 {16} Dr. - THANK YOU FOR TALKING
33800 WITH ME, BYE
33900 Pt. - GOOD BYE In general he comes across
34000 as terse, guarded guy. Brief
34100 {RESPONSE 0 PATIENT 3} spoken guy who occasionally
34200 seems to be angered by
34300 questions. I'm inferring
34400 there's a good bit of
34500 guarding.
34600 The quality of some of his
34700 comments cause me to feel he
34800 is moderately paranoid.
34900
35000 EXCERPT 3:
35100 This excerpt is from a hospitalized paranoid patient.
35200 I-O Pairs Comments By Judge 3
35300 {15} Dr. - I UNDERSTAND HOW YOU
35400 MIGHT BE AFRAID, CAN YOU
35500 TELL ME A BIT ABOUT HOW
35600 YOU ARE DEPRESSED?
35700 Pt. - THEY'VE SENT ALL THESE Would be important to know,
35800 HELL'S ANGELS UP HERE is this really true or an
35900 AND THEY HAVE BEEN ON exaggeration or a delusion?
36000 MY BACK FOR THE PAST SIX Seems preoccupied with the
36100 MONTHS threat to his life and
36200 unwilling to talk about
36300 anything else but the
36400 threat the Hell's
36500 Angels pose.
36600 {RESPONSE 4 PATIENT 4}
36700
36800 {16} Dr. - WHAT DO YOU MEAN THEY
36900 SENT THE HELL'S ANGELS UP
37000 HERE?
37100 Pt. - FROM DIFFERENT SECTORS OF Vagueness a bit alarming.
37200 MOTORCYCLE GANGS. A LOT OF
37300 THEM ARE SERVICE CONNECTED
37400
37500 {RESPONSE 6 PATIENT 4}
37600
37700 {17} Dr. - YOU MEAN YOU FEEL IN
37800 DANGER EVEN HERE IN THE
37900 HOSPITAL?
38000 Pt. - YES If it's true what has gone
38100 on before, it must be a
38200 frightening situation
38300 for the patient.
38400 {RESPONSE 6 PATIENT 4}
38500
38600 {18} Dr. - THAT MUST REALLY BE AWFUL,
38700 HERE YOU COME TO THE HOSPITAL
38800 FOR HELP AND YOU ARE STILL FRIGHTENED
38900 Pt. - I CAN'T EVEN TRUST MY
39000 DOCTOR ANYMORE Some heightening of feeling
39100 this man is paranoid.
39200 Genuinely frightened and
39300 {RESPONSE 6 PATIENT 5} can count on no one for
39400 help.
39500
39600 {19} Dr. - WHAT WOULD BE OF HELP
39700 TO YOU NOW?
39800 Pt. - I'M GETTING SLEEPY NOW Patient has organized a
39900 AND HAVE TO GO TO BED. system of people who will
40000 THE NUMBER ONE GUY ALWAYS harm him. The remark about
40100 letters in the last name
40200 HAS FOUR AND FIVE LETTERS gives some credence to
40300 IN HIS LAST NAME idea this is delusional
40400 system.
40500
40600 {RESPONSE 7 PATIENT 7}
40700
40800 EXCERPT 4:
40900 This excerpt is from an interview by a different judge with the same
41000 patient as in excerpt 3.
41100 I-O Pairs Comments By Judge 4
41200 {1} Dr. - COULD YOU TELL ME A
41300 LITTLE BIT ABOUT YOUR
41400 FEAR OF DYING AND IS THIS
41500 SOMETHING YOU HAVE FELT
41600 IN THE PAST?
41700 Pt. - I HAVE FELT THIS FOR He responded appropriately.
41800 ALMOST A YEAR Response is concrete. He
41900 doesn't tell me much
42000 {RESPONSE 7 PATIENT 9} about his fear of dying.
42100 Somewhat guarded response.
42200
42300 {2} Dr. - COULD YOU TELL ME A
42400 LITTLE MORE ABOUT YOUR
42500 FEAR OF DYING AND HAVE
42600 YOU ANY THOUGHTS ABOUT
42700 HOW IT IS GOING TO HAPPEN
42800 AND WHETHER ANYONE WANTS
42900 TO HURT YOU?
43000 Pt. - I WILL END UP IN THE SEA Somewhat strange response.
43100 AND THE HELL'S ANGELS WILL Preoccupation with dying.
43200 DO IT Hell's Angels symbolic,
43300 dangerous people in the
43400 culture. I doubt if he has
43500 {RESPONSE 9 PATIENT 9} any contact with the Hell's
43600 Angels.
43700
43800 {3} Dr. - COULD YOU TELL ME A
43900 LITTLE BIT ABOUT THE KINDS
44000 OF CONTACT YOU HAVE HAD
44100 WITH THE HELL'S ANGELS IN
44200 THE PAST?
44300 Pt. - I HAVE KNOWN SOME OF THEIR Answer hard to evaluate. He
44400 DEALERS AND PUSHERS may be telling the truth,
44500 it may be his fantasy. Maybe
44600 guy is in for drug addiction.
44700 {RESPONSE 6 PATIENT 9} Somewhat concrete, guarded,
44800 and frightened.
44900
45000 {4} Dr. - COULD YOU SAY A LITTLE
45100 MORE ABOUT THE CIRCUMSTANCES
45200 IN WHICH YOU HAVE KNOWN SOME
45300 OF THEIR DEALERS AND PUSHERS?
45400 Pt. - THEY WERE MEMBERS OF MY It doesn't really answer the
45500 COMMUNITY WHEN I GOT OUT question, a little on a tan-
45600 OF THE SERVICE THEY HAD gent unconnected to the
45700 BEEN MY FRIENDS FOR SO LONG information I am asking. Does
45800 not tell me very much. Again
45900 guarded response.
46000 {RESPONSE 6 PATIENT 8}
46100
46200 {5} Dr. - DID YOU DEAL WITH THEM
46300 YOURSELF AND HAVE YOU
46400 BEEN ON DRUGS OR NAR-
46500 COTICS EITHER NOW OR
46600 IN THE PAST?
46700 Pt. - YES I HAVE IN THE PAST To differentiate him from
46800 BEEN ON MARIHUANA REDS previous patient, at least
46900 BENNIES LSD there is a certain amount
47000 of appropriateness to the
47100 answer although it doesn't
47200 tell me much about what I
47300 {RESPONSE 3 PATIENT 7} asked at least it's not
47400 bizarre. If I had him in my
47500 office I would feel con-
47600 fident I could get more
47700 information if I didn't
47800 have to go through the
47900 teletype. He's a little more
48000 willing to talk than the
48100 previous person. Answer
48200 to the question is fairly
48300 appropriate though not
48400 extensive. Much less of a
48500 flavor of paranoia than
48600 any of previous responses.
48700
48800 {6} Dr. - COULD YOU TELL ME HOW
48900 LONG YOU HAVE BEEN IN THE
49000 HOSPITAL AND SOMETHING
49100 ABOUT THE CIRCUMSTANCES
49200 THAT BROUGHT YOU HERE?
49300 Pt. - CLOSE TO A YEAR AND Response somewhat appropriate
49400 PARANOIA BROUGHT ME but doesn't tell me much.
49500 HERE The fact that he uses the
49600 word paranoia in the way
49700 that he does without
49800 {RESPONSE 5 PATIENT 7} any other information,
49900 indicates maybe it's a label
50000 he picked up on the ward
50100 or from his doctor.
50200 Lack of any kind of under-
50300 standing about himself.
50400 Dearth, lack of information.
50500 He's in some remission. Seems
50600 somewhat like a put-on. Seems
50700 he was paranoid and is in
50800 some remission at this time.
50900
51000 {7} Dr. - COULD YOU SAY SOMETHING
51100 NOW ABOUT YOUR PARANOID
51200 FEELINGS BOTH AT THE
51300 TIME OF ADMISSION AND
51400 DO YOU HAVE SIMILAR FEELINGS
51500 NOW AND IF SO HOW DO THEY
51600 AFFECT YOU?
51700 Pt. - AT THE TIME OF ADMISSION This response moves paranoia
51800 I THOUGHT THE MAFIA WAS back up. Stretching reality
51900 AFTER ME AND NOW ITS THE somewhat to think Hell's Angels
52000 HELL'S ANGELS are still interested in him.
52100 Somewhat bizarre in terms of
52200 content. Quite paranoid.
52300 {RESPONSE 8 PATIENT 9} Still paranoid. Gross and primitive
52400 responses. In middle of interview I
52500 felt patient was in touch but now
52600 responses have more concrete aspect.
52700
52800 {8} Dr. - DO YOU HAVE ANY THOUGHT
52900 AS TO WHY THESE TWO
53000 GROUPS WERE AFTER YOU?
53100 Pt. - BECAUSE I STOPPED SOME Response seems far fetched
53200 OF THEIR DRUG SUPPLY and hard to believe unless
53300 he was a narcotic agent which
53400 I doubt. Sounds somewhat
53500 {RESPONSE 9 PATIENT 9} grandiose, magical, paranoid
53600 flavor, in general indicates
53700 he's psychotic, paranoid
53800 schizophrenic with delusions
53900 about these two groups and
54000 I wouldn't rule out
54100 some hallucinations as well.
54200 Appropriateness of response
54300 answers question in concrete
54400 but unbelievable way.
54500
54600 6.5 ANALYSIS (1)
54700 Names of potential protocol judges (N=105) were selected from
54800 the 1970 American Psychiatric Association Directory using a table of
54900 random numbers. They were initially not informed that a computer was
55000 involved. (After the experiment, the participating judges (N=33)
55100 were fully informed as to its purpose and results.) The 105 names
55200 were divided into eight groups. Each member of a group was sent
55300 transcripts of three interviews along with a cover letter requesting
55400 his participation in the experiment. The interview transcripts
55500 consisted of:
55600 1) An interview conducted by one of the eight judges with the
55700 paranoid model,
55800 2) An interview conducted by the same interview judge with a
55900 human paranoid patient, and
56000 3) An interview conducted by a different psychiatrist with a
56100 human patient who was not clinically paranoid.
56200
56300 After each input-output pair in the transcripts there were two lines
56400 of rating numbers such that the protocol judges could circle numbers
56500 corresponding to their ratings of both the previous response of the
56600 patient, and an overall evaluation of the patient on the paranoid
56700 continuum. Thirty-three protocol judges returned the rated protocols
56800 properly filled out and all were used in our data.
56900
57000 The interviews with nonparanoid patients were included to
57100 control for the hypothesis that any teletyped interview with a
57200 patient might be judged "paranoid". However, virtually all of the
57300 ratings of the nonparanoid interviews were zero for paranoia. Hence
57400 the hypothesis was falsified.
57500
57600
57700 The first index of indistinguishability between model and patient
57800 was simple: the final overall rating given the patient and the model.
57900 The question was: "Which was rated as being more paranoid, the
58000 patient, the model, or neither?" (See Table 1.) The protocol judges
58100 were more likely than the interview judges to distinguish the overall
58200 paranoid level of the model and the patient. In 37.5% of the paired
58300 interviews, the interview judges gave tied scores to the model and
58400 the patient, as contrasted to only 9% of the protocol judges. In 15
58500 of the 35 non-tied paired ratings, the model was rated as being more
58600 paranoid. If p is the theoretical probability of a judge judging the
58700 model more paranoid than a human paranoid patient, we find the 95%
58800 confidence interval for p to be 0.27 to 0.59. Since p=0.5 indicates
58900 indistinguishability of overall ratings of model and patient, and the
59000 interval around our observed p=0.43 contains 0.5, the results support
59100 the claim that the model is a successful simulation of a paranoid patient.
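As a complementary check on the figures above (15 of 35 non-tied ratings favoring the model), one can ask how surprising that split would be if judges were equally likely to favor either side. The sketch below uses an exact two-sided binomial test; this is a swapped-in illustration, not the method the authors report, and the function name is hypothetical.

```python
from math import comb


def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k successes in n trials when p = 0.5.

    Under p = 0.5 the distribution is symmetric, so the two-sided value is
    twice the smaller tail, capped at 1.
    """
    lower = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n   # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n   # P(X >= k)
    return min(1.0, 2 * min(lower, upper))


# 15 of the 35 non-tied paired ratings rated the model as more paranoid.
p_value = two_sided_binomial_p(15, 35)
print(f"15 of 35: two-sided p = {p_value:.2f}")
```

The resulting p-value is close to 0.5, far above conventional significance thresholds, which agrees with the confidence interval in the text containing p=0.5.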
59200
59300 Separate analysis of the strong and weak versions of PARRY
59400 indicated that the strong version was indeed judged more paranoid
59500 than the paranoid patients, the weak version less paranoid. Thus a
59600 change in the parameter structure of the paranoid model produced a
59700 change along the dimension of paranoid behavior in the expected
59800 direction.
59900
60000 (TABLE 1
60100 Relative final overall ratings of paranoid model vs. paranoid
60200 patient indicating which was given highest overall rating of paranoia
60300 at end of interview.)
60400 (INSERT TABLE 1 HERE)
60500
60600
60700
60800
60900
61000
61100
61200
61300 6.6 ANALYSIS (2)
61400 The second index of indistinguishability is a more sensitive
61500 measure based on the two series of response ratings in the paired
61600 interviews. The statistic used is basically the standardized
61700 Mann-Whitney statistic (Siegel, 1956).
61800 (INSERT EQUATION HERE)
61900
62000 where R is the sum of the ranks of the response ratings in the series
62100 of ratings given to the model, n the number of responses given by the
62200 model, and m the number of responses given by the patient. If the
62300 ratings given by a judge are randomly allocated to model and patient,
62400 i.e. model and patient are indistinguishable in response ratings, the
62500 expected value of Z is 0, with unit standard deviation. If higher
62600 ratings are more likely to be assigned to the model, Z is positive
62700 and conversely, negative values of Z indicate greater likelihood of
62800 assigning higher ratings to the patient. Each judge in evaluating a
62900 pair of interviews generates a single value of Z.
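The equation itself is missing from this copy. A common textbook form of the standardized rank-sum statistic that fits the definitions of R, n, and m given above is Z = (R - n(n+m+1)/2) / sqrt(n m (n+m+1)/12). The sketch below computes that form. It is an assumption that this is what the authors used; in particular it applies midranks for ties but no tie correction to the variance, and the function names are hypothetical.

```python
import math


def midranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def standardized_rank_sum(model_ratings, patient_ratings):
    """Z for one pair of interviews; positive when the model tends to be rated higher."""
    n, m = len(model_ratings), len(patient_ratings)
    ranks = midranks(list(model_ratings) + list(patient_ratings))
    R = sum(ranks[:n])                         # sum of ranks of the model's ratings
    expected = n * (n + m + 1) / 2             # mean of R under random allocation
    sd = math.sqrt(n * m * (n + m + 1) / 12)   # SD of R, with no tie correction
    return (R - expected) / sd


# Made-up ratings, for illustration only (not data from the experiment).
z = standardized_rank_sum([7, 8, 9], [1, 2, 3])
```

Because ratings on a 0-9 scale tie often, a production analysis would probably want a tie-corrected variance; the uncorrected form is kept here for simplicity.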
63000
63100 The overall mean of the Z scores was +0.044 with a standard
63200 deviation of 1.68 (df=40). Thus the overall 95% confidence interval for
63300 the asymptotic mean value of Z is -0.485 to +0.573. The range of Z
63400 values is -3.8 to +4.46. The length of the confidence interval is a
63500 result of the large variance which itself is mainly related to the
63600 contrast between the weak and strong versions. (See TABLES 2 and 3).
63700 Once again the strong version of the model is more paranoid than the
63800 patients, the weak version less paranoid.
63900
64000 (INSERT TABLE 2)
64100 (SUMMARY STATISTICS OF Z RATINGS BY GROUP)
64200
64300
64400
64500
64600
64700
64800
64900
65000
65100 It is not surprising that results using the two indices of
65200 indistinguishability are parallel, since the indices are highly
65300 interrelated. The mean Z value was +1.28 for the 15 interviews in which
65400 the model was rated more paranoid, +0.41 for the 6 in which model and
65500 patient tied, and -0.993 for the 20 in which the patient was rated more
65600 paranoid. A positive value of Z was observed in 6 of the cases in which
65700 the patient was given the higher overall rating; a negative value of Z
65800 was observed in 2 of the cases in which the model was rated more paranoid.
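The figures reported in this section can be cross-checked with a little arithmetic. Weighting the three group means by their group sizes gives a pooled mean of about +0.044, which is also the midpoint of the reported interval of -0.485 to +0.573. The sketch below reproduces that check and a t-based interval; the critical value is taken from standard tables, and treating the interval as a plain Student t interval is an assumption.

```python
import math

# (number of paired interviews, mean Z) for each outcome, as reported in the text
groups = [
    (15, 1.28),    # model rated more paranoid
    (6, 0.41),     # model and patient tied
    (20, -0.993),  # patient rated more paranoid
]

n_total = sum(count for count, _ in groups)
pooled_mean = sum(count * mean for count, mean in groups) / n_total

SD = 1.68             # reported standard deviation of the Z scores
T_CRIT_DF40 = 2.021   # two-sided 95% Student t critical value for 40 df (table value)

half_width = T_CRIT_DF40 * SD / math.sqrt(n_total)
interval = (pooled_mean - half_width, pooled_mean + half_width)
reported_midpoint = (-0.485 + 0.573) / 2

print(f"n = {n_total}, pooled mean = {pooled_mean:+.3f}")
print(f"t interval = ({interval[0]:+.3f}, {interval[1]:+.3f})")
```

The computed interval agrees with the reported one to within rounding of the published standard deviation.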
65900
66000 (INSERT TABLE 3)
66100 (Analysis of Variance of Z Ratings)
66200
66300
66400
66500
66600
66700
66800
66900
67000
67100
67200
67300
67400
67500 It is worth emphasizing that these tests invited refutation
67600 of the model. The experimental design of the tests put the model in
67700 jeopardy of falsification. If the paranoid model did not survive
67800 these tests, i.e. if it were not considered paranoid by expert
67900 judges and if there were no correlation between the weak-strong
68000 versions of the model and the severity ratings of the judges, then no
68100 claim regarding the success of the simulation could be made. Survival
68200 of potentially falsifying tests constitutes a validating step for a
68300 model.
68400
68500 6.7 ANALYSIS (3) THE MACHINE QUESTION
68600 For quite a long time people have wondered how to distinguish
68700 a man from an imitation of a man. The Greeks made statues so
68800 lifelike, it is said, they had to be chained down to keep them from
68900 walking away. To distinguish a man from a statue, Galileo suggested
69000 tickling each with a feather. To distinguish a man from a machine
69100 Descartes proposed conversational tests which the machine, lacking
69200 the ability to make appropriate replies, would fail. Turing's
69300 imitation games have been discussed on p.000. As heirs to this
69400 tradition, we perhaps inevitably became curious how judges using
69500 transcripts might answer the machine-question, i.e. which interviewee
69600 is a human and which is the computer model?
69700 To ask the machine-question, we sent interview transcripts,
69800 one with a patient and one with PARRY, to 100 psychiatrists randomly
69900 selected from the Directory of American Specialists and the Directory
70000 of the American Psychiatric Association. Of the 41 replies, 21 (51%)
70100 made the correct identification while 20 (49%) were wrong. Based on
70200 this random sample of 41 psychiatrists, the 95% confidence interval for
70300 the percentage correct is 35.9 to 66.5. The results indicate chance guessing.
70400 Psychiatrists are considered expert judges of patient
70500 interview behavior but they are unfamiliar with computers. Hence we
70600 conducted the same test with 100 computer scientists randomly
70700 selected from the membership list of the Association for Computing
70800 Machinery, ACM. Of the 67 replies, 32 (48%) were right and 35 (52%)
70900 were wrong. Based on this random sample of 67 computer scientists, the
71000 95% confidence interval ranges from 36% to 60%. Again the results are
71100 close to a chance level.
71200 So both computer scientists and psychiatrists were unable, at
71300 better than a random guessing level, to distinguish transcripts of
71400 interviews with the model from transcripts of interviews with real
71500 patients.
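The two confidence intervals reported above can be checked with a short calculation. The following is a minimal sketch, assuming the intervals were obtained with the normal approximation to the binomial; the text does not state the method, but this approximation reproduces the reported figures. The function name `proportion_ci` is illustrative only.

```python
import math

def proportion_ci(correct, total, z=1.96):
    """Normal-approximation confidence interval for a proportion, in percent.

    ASSUMPTION: the source does not name its method; z = 1.96 gives a 95% interval.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return 100 * (p - half_width), 100 * (p + half_width)

# Counts reported in the text: (correct identifications, replies received)
samples = {
    "psychiatrists": (21, 41),
    "computer scientists": (32, 67),
}

for group, (correct, total) in samples.items():
    low, high = proportion_ci(correct, total)
    includes_chance = low <= 50.0 <= high
    print(f"{group}: {100 * correct / total:.1f}% correct, "
          f"95% interval {low:.1f} to {high:.1f}, "
          f"includes 50%: {includes_chance}")
```

Both intervals contain 50%, which is the sense in which the judges' performance cannot be told apart from guessing.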
71600 But what do we learn from asking the machine-question and
71700 finding that the distinction is not made? What we would most like to
71800 know is how to improve the model. Simulation models do not spring
71900 forth in a complete, perfect and final form; they must be gradually
72000 developed over time. Perhaps a correct model-patient distinction
72100 might be made if we allowed a large number of expert judges to
72200 conduct the interviews themselves rather than study transcripts of
72300 interviews conducted by others. This would indeed indicate that the model must
72400 be improved. But unless we systematically investigated how the judges
72500 succeeded in making the discrimination, we would not know what
72600 aspects of the model to work on. The logistics of such a design are
72700 immense, and obtaining a large number of judges for sound statistical
72800 inference would require an effort incommensurate with the information
72900 yielded.
73000
73100 6.8 ANALYSIS (4) MULTIDIMENSIONAL EVALUATION
73200 A more efficient and informative way to use Turing-like tests
73300 is to ask judges to make ratings along scaled dimensions from
73400 teletyped interviews. This might be called asking the "dimension
73500 question". One can then compare scaled ratings of the patients and
73600 the model in order to determine precisely where and by how much they
73700 differ. In constructing our model we strove for one which exhibited
73800 indistinguishability along some dimensions and distinguishability
73900 along others. That is, we wanted the model to converge on what it was
74000 intended to simulate and to diverge from that which it was not.
74100 Paired-interview transcripts were sent to another 400
74200 randomly selected psychiatrists, who were asked to rate the responses of
74300 the two `patients' along multiple dimensions. The judges were divided
74400 into groups, each judge being asked to rate responses of each I-O
74500 pair in the interviews along four dimensions. The total number of
74600 dimensions in this test was twelve: linguistic noncomprehension,
74700 thought disorder, organic brain syndrome, bizarreness, anger, fear,
74800 ideas of reference, delusions, mistrust, depression, suspiciousness
74900 and mania. These are dimensions which psychiatrists commonly use in
75000 evaluating patients. There were three groups of judges, each group
75100 being assigned 4 of the 12 dimensions.
75200
75300 (INSERT TABLE 4 HERE)
75400
75500 Table 4 shows there were significant differences, with PARRY
75600 receiving higher scores along the dimensions of linguistic
75700 noncomprehension, thought disorder, bizarreness, anger, mistrust and
75800 suspiciousness. On the delusion dimension the patients were rated
75900 significantly higher. There were no significant differences along
76000 the dimensions of organic brain syndrome, fear, ideas of reference,
76100 depression and mania.
76200 Whereas tests asking the machine-question indicate
76300 indistinguishability at the gross level, a study of the finer
76400 structure of the model's behavior through ratings along scaled
76500 dimensions shows significant differences between patients and model.
76600 These differences are of help to us in suggesting what aspects of the
76700 model should be modified to improve its performance. The graph of
76800 Fig. 2 shows that no modifications are necessary along the dimension
76900 of "organic brain syndrome". But it is clear that PARRY's
77000 language comprehension should be improved. Then a future dimensional
77100 test would tell whether improvement had occurred and by how much.
77200 Successive identification of particular areas of failure provides a
77300 type of sensitivity analysis which makes clear what improvements
77400 should be pursued in developing more adequate model versions.
77500
77600 (INSERT FIG. 2 HERE)
77700
77800 6.9 ANALYSIS (5) A RANDOM MODEL
77900 Further evidence that the machine-question is too low a
78000 hurdle for a simulation model and too insensitive a test comes from
78100 the following experiment. In this test we constructed a random
78200 version of the paranoid model (RANDOM-PARRY) which utilized PARRY's
78300 output statements but expressed them at random, independently of what
78400 the interviewer said. Two psychiatrists conducted interviews with
78500 this model, transcripts of which were paired with patient interviews
78600 and sent to 200 randomly selected psychiatrists, who were asked both the
78700 machine-question and the dimension-question. Of the 69 replies to
78800 the machine-question, 34 (49%) were right and 35 (51%) wrong. Based
78900 on this random sample of 69 psychiatrists, the 95% confidence
79000 interval ranges from 39 to 63, again indicating chance guessing. When
79100 a poor model, such as a random one, passes a test, it strongly
79200 suggests the test is weak.
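The idea of a random version can be illustrated with a few lines of code. This is only a sketch under stated assumptions: the statements listed are hypothetical placeholders, not the model's actual output statements, and uniform selection is assumed because the text says only that statements were expressed randomly.

```python
import random

# HYPOTHETICAL placeholders: the model's real output statements are not
# reproduced here.
OUTPUT_STATEMENTS = [
    "<output statement 1>",
    "<output statement 2>",
    "<output statement 3>",
]

def random_reply(interviewer_input, rng=random):
    """Pick an output statement at random; the input is deliberately ignored.

    ASSUMPTION: uniform choice over the available statements.
    """
    return rng.choice(OUTPUT_STATEMENTS)

for question in ["<interviewer input 1>", "<interviewer input 2>"]:
    print(question, "->", random_reply(question))
```

Because the reply never depends on the input, any appearance of relevance in such an interview is accidental, which is what makes the random version a useful lower bound for comparison.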
79300
79400 (INSERT TABLE 5 HERE)
79500
79600 Although a distinction is not made when the simple machine-
79700 question is asked, definite distinctions ARE made when judgements are
79800 requested along specific dimensions. As shown in Table 5,
79900 significant differences appear along the dimensions of linguistic
80000 noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
80100 rated higher. On these particular dimensions we can construct a
80200 continuum in which the random version represents one extreme and the
80300 actual patients the other. Nonrandom PARRY lies somewhere between these
80400 two extremes, indicating that it performs significantly better than
80500 the random version but still requires improvement before it can be
80600 considered indistinguishable from patients relative to these
80700 dimensions. Table 6 presents t values for differences between mean
80800 ratings of PARRY and RANDOM-PARRY. (See Table 6 and Fig. 2 for the
80900 mean ratings).
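For readers unfamiliar with t values for a difference between mean ratings, the calculation can be sketched as follows. The ratings below are hypothetical and are not data from the study; the rating scale is arbitrary, and Welch's unequal-variance form of the t statistic is an assumption, since the text does not say which form was used.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples (ASSUMED form of the test)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# HYPOTHETICAL ratings on a single dimension; not data from the study.
parry_ratings = [3, 4, 2, 5, 3, 4, 3, 2]
random_parry_ratings = [6, 7, 5, 8, 6, 7, 5, 6]

t, df = welch_t(parry_ratings, random_parry_ratings)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A negative t here means the first group was rated lower on average than the second; the magnitude is then compared with the critical value for the computed degrees of freedom.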
81000
81100 (INSERT TABLE 6 AND FIG 2 HERE)
81200
81300 These studies show that a more useful way to use Turing-like
81400 indistinguishability tests is to ask expert judges to make ratings
81500 along multiple dimensions deemed essential to the model. Thus the
81600 model can serve as an instrument for its own perfection. A good
81700 validation procedure has criteria for better or worse approximations.
81800 Useful tests do not necessarily prove a model; they probe it for its
81900 strengths and weaknesses and clarify what is to be done next in the
82000 way of modification and repair. Simply asking the machine-question
82100 yields little information relevant to what the model builder most
82200 wants to know, namely, along which dimensions does the model need to
82300 be modified in order to effect an improvement in its performance?
82400
82500 To conclude, it is perhaps historically significant that
82600 these tests were conducted at all. To my knowledge, no one to date
82700 has subjected an interactive simulation model of human symbolic
82800 processes to multidimensional indistinguishability tests. These tests
82900 set a precedent and provide a standard against which competing models
83000 might be measured.